Take-Home Exercise 2
The aim of this exercise is to design an age-sex pyramid based data visualisation to show the changes of demographic structure of Singapore by age cohort and gender between 2000-2020 at planning area level.The data used is Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2000-2010 and Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020 obtained from Singstat.
I would like to explore the following visualisations:
Install the necessary R packages - ‘tidyverse’, ‘readxl’, ‘knitr’, ‘ggplot2’, ‘plotly’,‘DT’,‘patchwork’,‘gganimate’,‘gifski’,‘gapminder’,‘ggiraph’.
packages = c('tidyverse', 'readxl', 'knitr', 'ggplot2', 'plotly','DT','patchwork','gganimate','gifski','gapminder','ggiraph')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
As the 2000-2010 data and the 2011 - 2020 data are in two separate files, we will first need to import them using read_ccv() and append them together using rbind().
pop_2000to2010<-read_csv("data/respopagesextod2000to2010.csv")
pop_2011to2020 <-read_csv("data/respopagesextod2011to2020.csv")
pop_combined<- rbind(pop_2000to2010,pop_2011to2020)
Sum the population after grouping by planning area, year, age group and gender.
Pop_grp <- pop_combined %>%
group_by(PA, Time, AG,Sex) %>%
summarise(`Total Pop`=sum(Pop))%>%
ungroup()
Pop_grp
# A tibble: 43,890 x 5
PA Time AG Sex `Total Pop`
<chr> <dbl> <chr> <chr> <dbl>
1 Ang Mo Kio 2000 0_to_4 Females 4460
2 Ang Mo Kio 2000 0_to_4 Males 4830
3 Ang Mo Kio 2000 10_to_14 Females 5520
4 Ang Mo Kio 2000 10_to_14 Males 5710
5 Ang Mo Kio 2000 15_to_19 Females 5930
6 Ang Mo Kio 2000 15_to_19 Males 6260
7 Ang Mo Kio 2000 20_to_24 Females 7160
8 Ang Mo Kio 2000 20_to_24 Males 7170
9 Ang Mo Kio 2000 25_to_29 Females 7750
10 Ang Mo Kio 2000 25_to_29 Males 8260
# ... with 43,880 more rows
Split the age group to starting age and ending age of the age group using separate() function in dplyr package and convert them to numeric.This is to enable the sorting of age group by starting age later.Split the Age group to the starting age and ending age of the age group using separate() function in dplyr package and convert them to numeric.This is to enable the sorting of age group by starting age later.
Pop_grp2<-separate(data=Pop_grp,col = AG,into=c("Starting_Age","Ending_Age"),
remove = FALSE, sep = "_to_")
Pop_grp2$Starting_Age<-as.numeric(Pop_grp2$Starting_Age)
Pop_grp2$Ending_Age<-as.numeric(Pop_grp2$Ending_Age)
Attempt to draw an animated Age-Sex Pyramid by filtering out 1 planning area first.The following chart shows how the population varies in Ang Mo Kio planning area from 2000 to 2020.
Pop_filter<-Pop_grp2 %>%
filter(PA=="Ang Mo Kio")
ggplot(Pop_filter, aes(y = reorder(AG,Starting_Age), fill= Sex,
x = ifelse(test = Sex =="Males",
yes = -`Total Pop`, no = `Total Pop`)))+
geom_col(stat="identity")+
scale_x_continuous(labels = abs, limits = max(Pop_filter$`Total Pop`) * c(-1,1))+
labs(title = 'Age-Sex Pyramid for Ang Mo Kio Population in:{as.integer(frame_time)}', x="Population", y ="Age Group")+
transition_time(Time)+
ease_aes('linear')
Try adding tooltip using the ggiraph package with data for a particular year and planning area. Geom_col_interactive() and girafe() are used to create this interaction.
Pop_filter<-Pop_grp2 %>%
filter(PA=="Ang Mo Kio",Time=="2000")
Pop_filter$tooltip<-c(paste0("Planning Area= ",Pop_filter$PA,
"\n Gender= ", Pop_filter$Sex,
"\n Pop = ", Pop_filter$`Total Pop`))
q<-ggplot(Pop_filter, aes(y = reorder(AG,Starting_Age), fill= Sex,
x = ifelse(test = Sex =="Males",
yes = -`Total Pop`, no = `Total Pop`)))+
geom_col_interactive(tooltip=Pop_filter$tooltip)+
scale_x_continuous(labels = abs, limits = max(Pop_filter$`Total Pop`) * c(-1,1))+
labs( title = 'Age-Sex Pyramid for Ang Mo Kio Population in 2000',
x="Population", y ="Age Group")
girafe(
ggobj = q,
width_svg = 8,
height_svg = 8*0.618
)
Filter raw data for the selected planning areas that we are keen to explore instead of drawing the age-sex pyramid for all planning areas to prevent cluttering of graphs. In this case, I have selected Ang Mo Kio & Bishan vs Punggol to observe how the population varies in more mature areas and less mature area across the years. facet_wrap() is used to show the charts split by the selected planning areas.
Pop_filter2<-Pop_grp2 %>%
filter(PA=="Ang Mo Kio"|PA=="Bishan"|PA=="Punggol")
r<-ggplot(Pop_filter2, aes(y = reorder(AG,Starting_Age), fill= Sex,
x = ifelse(test = Sex =="Males",
yes = -`Total Pop`, no = `Total Pop`)))+
geom_col(stat="identity")+
scale_x_continuous(labels = abs, limits = max(Pop_filter2$`Total Pop`) * c(-1,1))+
labs(title = 'Age-Sex Pyramid for SG Population in:{as.integer(frame_time)}',
x="Population", y ="Age Group")+
transition_time(Time)+
ease_aes('linear')
r+theme(aspect.ratio = 1)+facet_wrap(~PA, scales = "free_x")
Attempted to use crosstalk together with plotly to design an age-sex pyramid that changes based on the year and planning area filter. However, the filter function only shows the position of data point after all data points are plotted instead of filtering the raw data to draw a new plot.
After researching online,this could be due to the inherent limitations of crosstalk that “Crosstalk currently only works for linked brushing and filtering of views that show individual data points, not aggregate or summary views (where “observations” is defined as a single row in a data frame)."
It is definitely more challenging to use R in creating the filters vs creating similar visualisations using Tableau.
library(crosstalk)
SharedPop<-SharedData$new(Pop_grp2)
p<-ggplot(SharedPop, aes(y = reorder(AG,Starting_Age), fill= Sex,
x = ifelse(test = Sex =="Males",
yes = -`Total Pop`, no = `Total Pop`)))+
geom_col()+
scale_x_continuous(labels = abs)+
labs(title = 'Age-Sex Pyramid for SG Population', x="Population", y ="Age Group")
bscols(widths=c(2,NA,NA),
list(
filter_select("PA","Planning Area", SharedPop,~PA,multiple = FALSE),
filter_select("Time","Year", SharedPop,~Time,multiple = FALSE)
),
ggplotly(p)
)